Indexing HDFS Data in PDW: Splitting the data from the index

Authors

  • Vinitha Reddy Gankidi
  • Nikhil Teletia
  • Jignesh M. Patel
  • Alan Halverson
  • David J. DeWitt
Abstract

There is a growing interest in making relational DBMSs work synergistically with MapReduce systems. However, there are interesting technical challenges associated with figuring out the right balance between the use and co-deployment of these systems. This paper focuses on one specific aspect of this balance, namely how to leverage the superior indexing and query processing power of a relational DBMS for data that is often more cost-effectively stored in Hadoop/HDFS. We present a method to use conventional B+-tree indices in an RDBMS for data stored in HDFS and demonstrate that our approach is especially effective for highly selective queries.
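As a rough illustration of keeping an index separate from the data it points to, here is a minimal Python sketch. It is not the paper's implementation. A sorted in-memory list stands in for the B+-tree index held in the RDBMS, and a byte buffer stands in for a file in HDFS. The names `build_index` and `lookup`, the comma-separated record layout, and the `(key, offset, length)` entry format are all assumptions made for this example.

```python
import bisect
import io

# Hypothetical stand-in for a file stored in HDFS: one "key,value" record per line.
DATA = b"17,alice\n42,bob\n5,carol\n42,dave\n"


def build_index(data: bytes):
    """Scan the data once and return sorted (key, offset, length) entries.

    The sorted list plays the role of the index kept on the database side;
    only keys and record locations are stored, never the records themselves.
    """
    entries = []
    offset = 0
    for line in data.splitlines(keepends=True):
        key = int(line.split(b",", 1)[0])
        entries.append((key, offset, len(line)))
        offset += len(line)
    entries.sort()
    return entries


def lookup(index, store, key):
    """Return every record with the given key, reading only the matching bytes."""
    # (key,) sorts before any (key, offset, length), so this finds the first match.
    i = bisect.bisect_left(index, (key,))
    records = []
    while i < len(index) and index[i][0] == key:
        _, offset, length = index[i]
        store.seek(offset)  # jump straight to the record in the external file
        records.append(store.read(length).decode().rstrip("\n"))
        i += 1
    return records


INDEX = build_index(DATA)
STORE = io.BytesIO(DATA)
```

Calling `lookup(INDEX, STORE, 42)` touches only the two matching records instead of scanning the whole buffer, which is why an approach of this kind pays off mainly when a query selects few rows.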

Similar resources

Towards Zero-Overhead Adaptive Indexing in Hadoop

Several research works have focused on supporting index access in MapReduce systems. These works have allowed users to significantly speed up selective MapReduce jobs by orders of magnitude. However, all these proposals require users to create indexes upfront, which might be a difficult task in certain applications (such as in scientific and social applications) where workloads are evolving or ...

Does the Platelet Index Have a Guiding Role in the Association of Cancer and Pulmonary Thromboembolism?

Introduction: The diagnostic value of the D-dimer test varies with variable platelet numbers and functions in patients suffering from cancer and concomitant pulmonary thromboembolism (PTE). This requires easy and reliable evaluation tests. In this study, we aimed to investigate the hypothesis that platelet functions may be more guiding in the prediction and diagnosis of PTE rather than the numb...

Design Architecture-Based on Web Server and Application Cluster in Cloud Environment

Cloud has been a computational and storage solution for many data centric organizations. The problem today those organizations are facing from the cloud is in data searching in an efficient manner. A framework is required to distribute the work of searching and fetching from thousands of computers. The data in HDFS is scattered and needs lots of time to retrieve. The major idea is to design a w...

Design an Efficient Big Data Analytic Architecture for Retrieval of Data Based on Web Server in Cloud Environment

Cloud has been a computational and storage solution for many data centric organizations. The problem today those organizations are facing from the cloud is in data searching in an efficient manner. A framework is required to distribute the work of searching and fetching from thousands of computers. The data in HDFS is scattered and needs lots of time to retrieve. The major idea is to design a w...

Journal:
  • PVLDB

Volume: 7

Publication date: 2014